Smart Paper Notes

MLO: Multi-Object Tracking and Lidar Odometry in Dynamic Environment

Tingchen Ma, Yongsheng Ou

Category: Robotics

2022-04-25

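SLAM systems built on the static-scene assumption incur significant estimation errors when many moving objects are in the field of view. Tracking and maintaining semantic objects benefits scene understanding and provides rich decision information for planning and control modules. This paper introduces MLO, a multi-object lidar odometry system that tracks ego-motion and semantic objects using only a lidar sensor. To track multiple objects accurately and robustly, we propose a least-squares estimator that fuses 3D bounding boxes and geometric point clouds for object state updates. By analyzing the motion states of objects in the tracking list, the mapping module uses static objects and environmental features to eliminate accumulated error. At the same time, it provides continuous object trajectories in map coordinates. Our method is evaluated qualitatively and quantitatively under different scenarios on the public KITTI dataset. Experimental results show that, in highly dynamic, unstructured, and unknown semantic scenes, the ego-localization accuracy of MLO is better than that of state-of-the-art systems. Meanwhile, compared with filtering-based methods, the multi-object tracking method with semantic-geometric fusion also shows clear advantages in tracking accuracy and consistency.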

Translated by Google Translate

A Concept Knowledge Graph for User Next Intent Prediction at Alipay

Yacheng He, Qianghuai Jia, Lin Yuan, Ruopeng Li, Yixin Ou, Ningyu Zhang

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2023-01-02

This paper illustrates the technologies of user next intent prediction with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. Specifically, we propose AlipayKG to explicitly characterize user intent. AlipayKG is an offline concept knowledge graph in the Life-Service domain that models the historical behaviors of users, the rich content users interact with, and the relations between them. We further introduce a Transformer-based model that integrates expert rules from the knowledge graph to infer the online user's next intent. Experimental results demonstrate that the proposed system can effectively enhance the performance of downstream tasks while retaining explainability.

Differentiable Search of Accurate and Robust Architectures

Yuwei Ou, Xiangning Xie, Shangce Gao, Yanan Sun, Kay Chen Tan, Jiancheng Lv

Category: Machine Learning | Artificial Intelligence

2022-12-28

Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks, and various defense methods have been proposed. Among these methods, adversarial training has drawn increasing attention because of its simplicity and effectiveness. However, the performance of adversarial training is greatly limited by the architectures of the target DNNs, which often leaves the resulting DNNs with poor accuracy and unsatisfactory robustness. To address this problem, we propose DSARA to automatically search for neural architectures that are accurate and robust after adversarial training. In particular, we design a novel cell-based search space specifically for adversarial training, which improves the accuracy and the robustness upper bound of the searched architectures by carefully designing the placement of the cells and the proportional relationship of the filter numbers. We then propose a two-stage search strategy to search for neural architectures that are both accurate and robust. In the first stage, the architecture parameters are optimized to minimize the adversarial loss, which makes full use of the effectiveness of adversarial training in enhancing robustness. In the second stage, the architecture parameters are optimized to minimize both the natural loss and the adversarial loss using the proposed multi-objective adversarial training method, so that the searched neural architectures are both accurate and robust. We evaluate the proposed algorithm on natural data and under various adversarial attacks, and the results show the superiority of the proposed method in finding accurate and robust architectures. We also conclude that accurate and robust neural architectures tend to deploy very different structures near the input and the output, which has great practical significance for both hand-crafting and automatically designing accurate and robust neural architectures.

Towards Efficient Visual Simplification of Computational Graphs in Deep Neural Networks

Rusheng Pan, Zhiyong Wang, Yating Wei, Han Gao, Gongchang Ou, Caleb Chen Cao, Jingli Xu, Tong Xu, Wei Chen

Category: Artificial Intelligence | Machine Learning

2022-12-21

A computational graph in a deep neural network (DNN) denotes a specific data flow diagram (DFD) composed of many tensors and operators. Existing toolkits for visualizing computational graphs are not applicable when the structure is highly complicated and large-scale (e.g., BERT [1]). To address this problem, we propose leveraging a suite of visual simplification techniques, including a cycle-removing method, a module-based edge-pruning algorithm, and an isomorphic subgraph stacking strategy. We design and implement an interactive visualization system that is suitable for computational graphs with up to 10 thousand elements. Experimental results and usage scenarios demonstrate that our tool reduces the number of elements by 60% on average and hence makes it easier to recognize and diagnose DNN models. Our contributions are integrated into an open-source DNN visualization toolkit, namely MindInsight [2].

Reasoning with Language Model Prompting: A Survey

Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Huajun Chen

Category: Natural Language Processing | Artificial Intelligence | Computer Vision | Machine Learning

2022-12-19

Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis and negotiation. This paper provides a comprehensive survey of cutting-edge research on reasoning with language model prompting. We introduce research works with comparisons and summaries and provide systematic resources to help beginners. We also discuss potential reasons for the emergence of such reasoning abilities and highlight future research directions.

MIGA: A Unified Multi-task Generation Framework for Conversational Text-to-SQL

Yingwen Fu, Wenjie Ou, Zhou Yu, Yue Lin

Category: Natural Language Processing | Artificial Intelligence

2022-12-19

Conversational text-to-SQL is designed to translate multi-turn natural language questions into their corresponding SQL queries. Most state-of-the-art conversational text-to-SQL methods are incompatible with generative pre-trained language models (PLMs), such as T5. In this paper, we present a two-stage unified MultI-task Generation frAmework (MIGA) that leverages PLMs' ability to tackle conversational text-to-SQL. In the pre-training stage, MIGA first decomposes the main task into several related sub-tasks and then unifies them into the same sequence-to-sequence (Seq2Seq) paradigm with task-specific natural language prompts, so that multi-task training boosts the main task. Later, in the fine-tuning stage, we propose four SQL perturbations to alleviate the error propagation problem. MIGA tends to achieve state-of-the-art performance on two benchmarks (SparC and CoSQL). We also provide extensive analyses and discussions to shed light on some new perspectives for conversational text-to-SQL.

Segmentation Ability Map: Interpret deep features for medical image segmentation

Sheng He, Yanfang Feng, P. Ellen Grant, Yangming Ou

Category: Computer Vision

2022-12-19

Deep convolutional neural networks (CNNs) have been widely used for medical image segmentation. In most studies, only the output layer is exploited to compute the final segmentation results, and the hidden representations of the deep learned features have not been well understood. In this paper, we propose a prototype segmentation (ProtoSeg) method to compute a binary segmentation map based on deep features. We measure the segmentation ability of the features by computing the Dice score between the feature segmentation map and the ground truth, which we call the segmentation ability score (SA score for short). The SA score quantifies the segmentation abilities of deep features in different layers and units, which helps in understanding deep neural networks for segmentation. In addition, our method can provide a mean SA score that gives a performance estimate of the output on test images without ground truth. Finally, we use the proposed ProtoSeg method to compute the segmentation map directly on input images to further understand the segmentation ability of each input image. Results are presented on segmenting tumors in brain MRI, lesions in skin images, COVID-related abnormality in CT images, prostate segmentation in abdominal MRI, and pancreatic mass segmentation in CT images. Our method can provide new insights for interpretable and explainable AI systems for medical image segmentation. Our code is available at https://github.com/shengfly/ProtoSeg.
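
As a rough illustration of the Dice overlap that the SA score is built on, here is a minimal sketch. It is not the ProtoSeg implementation: the function name `dice_score`, the flat 0/1 mask representation, and the convention that two empty masks score 1.0 are all assumptions made for this example.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks agree perfectly.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical toy masks: a feature-derived map and a ground-truth map.
feature_mask = [0, 1, 1, 0, 1, 0]
ground_truth = [0, 1, 0, 0, 1, 1]
score = dice_score(feature_mask, ground_truth)
print(f"Dice = {score:.3f}")
```

Here two of the three foreground pixels in each mask overlap, so the score is 2·2 / (3 + 3) = 2/3.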

Modeling Global Distribution for Federated Learning with Label Distribution Skew

Tao Sheng, Chengchao Shen, Yuan Liu, Yeyu Ou, Zhe Qu, Jianxin Wang

Category: Machine Learning | Computer Vision

2022-12-17

Federated learning achieves joint training of deep models by connecting decentralized data sources, which can significantly mitigate the risk of privacy leakage. However, in a more general case, the distributions of labels differ among clients, a situation called "label distribution skew". Directly applying conventional federated learning without accounting for label distribution skew significantly hurts the performance of the global model. To this end, we propose a novel federated learning method, named FedMGD, to alleviate the performance degradation caused by label distribution skew. It introduces a global Generative Adversarial Network to model the global data distribution without access to local datasets, so the global model can be trained using global information about the data distribution without privacy leakage. The experimental results demonstrate that our proposed method significantly outperforms the state of the art on several public benchmarks. Code is available at https://github.com/Sheng-T/FedMGD.
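
To make "label distribution skew" concrete, the sketch below measures how far each client's label histogram sits from the pooled histogram. This is only a diagnostic written for illustration, not part of FedMGD: the client names, the toy labels, and the choice of total variation distance as the skew measure are all assumptions.

```python
from collections import Counter


def label_distribution(labels, num_classes):
    """Return the empirical class frequencies of a list of integer labels."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return [counts.get(c, 0) / len(labels) for c in range(num_classes)]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical clients whose label sets barely overlap.
clients = {
    "client_a": [0, 0, 0, 1],
    "client_b": [1, 1, 2, 2],
}
num_classes = 3

# Pooled labels stand in for the global distribution in this toy setting.
all_labels = [y for ys in clients.values() for y in ys]
global_dist = label_distribution(all_labels, num_classes)

skew = {
    name: total_variation(label_distribution(ys, num_classes), global_dist)
    for name, ys in clients.items()
}
print(skew)
```

A value of 0 would mean a client's labels match the pooled distribution exactly; larger values indicate stronger skew. Note that pooling labels like this is only possible in a toy setting, since real federated clients do not share their raw data.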

A Light-Weight LiDAR-Inertial SLAM System with Loop Closing

Kangcheng Liu, Huosen Ou

Category: Robotics

2022-12-12

In this work, we propose a lightweight integrated LiDAR-Inertial SLAM system with high efficiency and strong loop-closure capability. We found that current state-of-the-art LiDAR-Inertial SLAM systems perform poorly at loop closure: they often fail under large drift and suffer from limited efficiency in large-scale environments. First, to speed up the whole system, we propose a new sparse voxel-hashing data structure. Second, to improve point cloud-based localization performance, we integrate loop closure algorithms. Extensive experiments in large-scale, complicated real-world scenes demonstrate the effectiveness and robustness of the proposed LiDAR-Inertial SLAM system.
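
The general idea behind sparse voxel hashing can be sketched as follows. This is a generic illustration, not the data structure from the paper: the class `SparseVoxelMap`, its methods, and the 27-voxel neighbourhood query are assumptions chosen for the example. Only occupied voxels are stored, keyed by their integer grid coordinates in a hash map.

```python
import math
from collections import defaultdict


def voxel_key(point, voxel_size):
    """Map a 3D point to the integer coordinates of the voxel containing it."""
    return tuple(math.floor(c / voxel_size) for c in point)


class SparseVoxelMap:
    """Hash map from voxel coordinates to the points that fall inside them."""

    def __init__(self, voxel_size):
        if voxel_size <= 0:
            raise ValueError("voxel_size must be positive")
        self.voxel_size = voxel_size
        self.voxels = defaultdict(list)  # only occupied voxels use memory

    def insert(self, point):
        self.voxels[voxel_key(point, self.voxel_size)].append(point)

    def points_near(self, point):
        """Return points stored in the query's voxel and its 26 neighbours."""
        kx, ky, kz = voxel_key(point, self.voxel_size)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    # .get avoids creating empty entries for missing voxels.
                    found.extend(self.voxels.get((kx + dx, ky + dy, kz + dz), []))
        return found


vmap = SparseVoxelMap(voxel_size=1.0)
for p in [(0.2, 0.3, 0.1), (0.9, 0.9, 0.9), (5.5, 5.5, 5.5), (-0.1, 0.0, 0.0)]:
    vmap.insert(p)

near_origin = vmap.points_near((0.5, 0.5, 0.5))
print(len(vmap.voxels), len(near_origin))
```

Insertion and neighbourhood lookup take roughly constant time per point, regardless of map extent, which is the usual motivation for hashing voxels instead of allocating a dense grid.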

Towards Next Generation of Pedestrian and Connected Vehicle In-the-loop Research: A Digital Twin Simulation Framework

Zijin Wang, Ou Zheng, Liangding Li, Mohamed Abdel-Aty, Carolina Cruz-Neira, Zubayer Islam

Category: Robotics

2022-12-08

Digital Twin is an emerging technology that replicates real-world entities in a digital space. It has attracted increasing attention in the transportation field, and many researchers are exploring its future applications in the development of Intelligent Transportation System (ITS) technologies. Connected vehicles (CVs) and pedestrians are among the major traffic participants in ITS. However, the use of Digital Twin in research involving both CVs and pedestrians remains largely unexplored. In this study, a Digital Twin framework for CV and pedestrian in-the-loop simulation is proposed. The proposed framework consists of the physical world, the digital world, and the data transmission between them. The features of the entities (CVs and pedestrians) that need to be digitally twinned are divided into external state and internal state, and the attributes in each state are described. We also demonstrate a sample architecture under the proposed Digital Twin framework, which is based on Carla-Sumo co-simulation and a Cave automatic virtual environment (CAVE). The proposed framework is expected to provide guidance for future Digital Twin research, and the architecture we build can serve as a testbed for further research and development of ITS applications involving CVs and pedestrians.